Incremental plan aggregation for generating policies in MDPs
Authors
Abstract
Despite recent advances in planning with MDPs, the problem of generating good policies remains hard. This paper describes a way to generate policies in MDPs by (1) determinizing the given MDP model into a classical planning problem; (2) building partial policies off-line by producing solution plans to the classical planning problem and incrementally aggregating them into a policy; and (3) using sequential Monte-Carlo (MC) simulations of the partial policies before execution, in order to assess the probability that a policy will require replanning during execution. The objective of this approach is to quickly generate policies whose probability of replanning is low and below a given threshold. We describe our planner RFF, which incorporates the above ideas. We present theorems showing the termination, soundness, and completeness properties of RFF. RFF was the winner of the fully-observable probabilistic track in the 2008 International Planning Competition (IPC-08). In addition to our analyses of the IPC-08 results, we analyzed RFF's performance with different plan aggregation and determinization strategies, with varying amounts of MC sampling, and with varying threshold values for the probability of replanning. The results of these experiments revealed how these factors impact the time RFF needs to generate solution policies and the quality of those policies (i.e., the average accumulated reward gathered from their execution).
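The loop below is a minimal, illustrative sketch of this determinize-plan-aggregate-simulate cycle. The MDP interface (initial_state, is_goal, actions, most_likely_outcome, sample_outcome) and the classical_plan placeholder (standing in for a classical planner such as FF) are assumptions made for the example, not RFF's actual API.

```python
def determinize(mdp):
    """Most-likely-outcome determinization: every probabilistic action is
    replaced by a deterministic action producing its most probable effect."""
    return {a: mdp.most_likely_outcome(a) for a in mdp.actions}

def classical_plan(det_model, start, goal_test):
    """Placeholder for a classical planner such as FF; expected to return a
    list of (state, action) pairs leading from start to a goal state."""
    raise NotImplementedError  # assumption: supplied by an external planner

def generate_policy(mdp, threshold=0.2, num_samples=100, horizon=50):
    det = determinize(mdp)
    # Seed the partial policy with one plan from the initial state.
    policy = dict(classical_plan(det, mdp.initial_state, mdp.is_goal))

    while True:
        # Monte-Carlo simulation of the partial policy: count runs that reach
        # a state with no policy entry, i.e. that would trigger replanning.
        failure_states = []
        for _ in range(num_samples):
            s = mdp.initial_state
            for _ in range(horizon):
                if mdp.is_goal(s):
                    break
                if s not in policy:
                    failure_states.append(s)
                    break
                s = mdp.sample_outcome(s, policy[s])

        if len(failure_states) / num_samples <= threshold:
            return policy  # estimated replanning probability is low enough

        # Aggregation step: plan from the sampled failure states and merge
        # the new plans into the partial policy, keeping existing entries.
        for s in set(failure_states):
            plan = classical_plan(det, s, mdp.is_goal)
            if plan:
                for state, action in plan:
                    policy.setdefault(state, action)
```

The threshold, sample count, and horizon values here are arbitrary illustration defaults; in practice they correspond to the replanning-probability threshold and MC sampling effort whose effects the paper's experiments examine.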
Similar references
Incremental Structure Learning in Factored MDPs with Continuous States and Actions
Learning factored transition models of structured environments has been shown to provide significant leverage when computing optimal policies for tasks within those environments. Previous work has focused on learning the structure of factored Markov Decision Processes (MDPs) with finite sets of states and actions. In this work we present an algorithm for online incremental learning of transitio...
Prioritized Goal Decomposition of Markov Decision Processes: Toward a Synthesis of Classical and Decision Theoretic Planning
We describe an approach to goal decomposition for a certain class of Markov decision processes (MDPs). An abstraction mechanism is used to generate abstract MDPs associated with different objectives, and several methods for merging the policies for these different objectives are considered. In one technique, causal (least-commitment) structures are generated for abstract policies and plan mergi...
Hierarchical Control and Learning for Markov Decision Processes
This dissertation investigates the use of hierarchy and problem decomposition as a means of solving large, stochastic, sequential decision problems. These problems are framed as Markov decision problems (MDPs). The new technical content of this dissertation begins with a discussion of the concept of temporal abstraction. Temporal abstraction is shown to be equivalent to the transformation of a ...
Finite-Horizon Markov Decision Processes with State Constraints
Markov Decision Processes (MDPs) have been used to formulate many decision-making problems in science and engineering. The objective is to synthesize the best decision (action selection) policies to maximize expected rewards (minimize costs) in a given stochastic dynamical environment. In many practical scenarios (multi-agent systems, telecommunication, queuing, etc.), the decision-making probl...
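As a concrete illustration of the finite-horizon objective described above (maximizing expected reward over a fixed number of decision stages), the following is a standard backward-induction sketch. The tabular interface (states, actions, P, R, T) is assumed for the example, and the state-constraint handling studied in the cited work is not reproduced.

```python
def finite_horizon_value_iteration(states, actions, P, R, T):
    """Backward induction for a finite-horizon MDP.
    P[s][a] is a dict {next_state: probability}; R[s][a] is the immediate
    reward; T is the horizon. Returns a time-dependent policy pi[t][s] and
    value function V[t][s]."""
    V = [{s: 0.0 for s in states} for _ in range(T + 1)]
    pi = [{} for _ in range(T)]
    for t in range(T - 1, -1, -1):          # work backward from the horizon
        for s in states:
            best_a, best_q = None, float("-inf")
            for a in actions:
                # Expected value of taking a in s at stage t.
                q = R[s][a] + sum(p * V[t + 1][s2] for s2, p in P[s][a].items())
                if q > best_q:
                    best_a, best_q = a, q
            V[t][s], pi[t][s] = best_q, best_a
    return pi, V
```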
Extreme State Aggregation beyond MDPs
We consider a Reinforcement Learning setup where an agent interacts with an environment in observation-reward-action cycles without any (esp. MDP) assumptions on the environment. State aggregation and more generally feature reinforcement learning is concerned with mapping histories/raw-states to reduced/aggregated states. The idea behind both is that the resulting reduced process (approximately...